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Abstract. The purpose of this paper is to introduce an algorithm that 
can detect the most unusual part of a digital image. The most unusual 
part of a given shape is defined as a part of the image that has the 
maximal distance to all non intersecting shapes with the same form. 
The method can be used to scan image databases with no clear model 
of the interesting part or large image databases, as for example medical 
databases. 



1 Introduction 

In this paper we are trying to find the most unusual/rare part with predefined 
size of a given image. If we consider an one-dimensional quasi-periodical image, 
as for example electrocardiogram (ECG), the most unusual parts with length 
about one second will be the parts that correspond to rhythm abnormalities [B] . 
Therefore they are of some interest. Considering two dimensional images, we can 
suppose that the most unusual part of the image can correspond to something 
interesting of the image. 

Of course, if we have a clear mathematical model of what the interesting part 
of the image can be, it would be probably better to build a mathematical model 
that detects those unusual characteristics of the image part that are interesting. 
However, as in the case of ECG, the part that we are looking for, can not be 
defined by a clear mathematical model, or just the model can not be available. In 
such cases the most unusual part can be an interesting instrument for screening 
images. 

To state the problem, we need first of all a definition of the term "most 
unusual part" . Let us chose some shape S within the image A, that could contain 
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that part and let us denote the cut of the figure A with shape S and origin r by 
Asip;r), e.g. 

As{p;r) = Sip)A{p + r), 

where p is the in-shape coordinate vector, r is the origin of the cut As and we 
used the characteristic function S{.) of the shape S. Further in this paper we 
win omit the arguments of ^5. We can suppose that the rarest part is the one 
that has the largest distance with the rest of the cuts with the same shape. 

Speaking mathematically, we can suppose that the most unusual part is lo- 
cated at the point r, defined by: 

r = argmax min ||As(r) — ^^(r')!!. (1) 

r r':|r'— r|>diam(S) 

Here we assume that the shifts do not cross the border of the image. The norm 
1 1 . 1 1 is assumed to be L2 nor nQ 

Because the parts of an image that intersect significantly are similar, we 
do not allow the shapes located at r' and r to intersect, avoiding this by the 
restriction on r' : \r' — r\ > diam(S'). 

If we are looking for the part of the image to be rare in a context of an 
image database, we can assume that further restrictions on r' can be added, for 
example restricting the search to avoid intersection with several images. 

The definition above can be interesting as a mathematical construction, but if 
we are looking for practical applications, it is too strict and does not correspond 
exactly to the intuitive notion of the interesting part as there can be several 
interesting parts. Therefore the correct definition will be to find the outliers of 
the distribution of the distances between the blocks 1 1 . 1 1 . 

If the figure has N'^ points, and 1 15*1 1 <C 1 1^| | , in order to find deterministically 
the most interesting part, we need N'^ operations. This is unacceptable even for 
large images, not concerning image databases. Therefore we are looking for an 
algorithm that provides an approximate solution of the problem and solves it 
within some probability limit. 

As is defined above in Eq.ll}, the problem is very similar to the problem 
of location of the nearest neighbor between the blocks. This problem has been 
studied in the literature, concerning Code Book and Fractal Compression [1]. 
However, the problem of finding r in the above equation, without specifying r', 
as we show in the present paper, can be solved by using probabilistic methods 
avoiding slow calculations. 

Summarizing the above statements, we are looking for an algorithm that two 
blocks are similar or different with some probability. 



Similar results are achieved with Li norm. The algorithm was not tested with Lmax 
norm due to its extreme noise sensitivity. We use L2 because of its relation with 
PSNR criteria that closely resembles the human subjective perception. 
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Fig. 1. The original test image. X- 
ray image of a person with ingested 
coin. 



Fig. 2. The distribution of the pro- 
jection value for square shape with 
a size 48x48 pixels. 



2 The Method 
2.1 Projections 

The problem in estimating the minima of Eq. ([T]) is complicated because the 
block is multidimensional. Therefore we can try to simplify the problem by 
projecting the block B = As{r) in one dimension using some projection operator 
X. For this aim, we consider the following quantity: 

b^\X.Bi-X.B\ = \X.{Bi~B)\, \X\^1. (2) 
The dot product in the above equation is the sum over all p-s: 

X.B = Y,X{p)B{p;r). 
p 

If X is random, and uniformly distributed on the sphere of corresponding di- 
mension, then the mean value of b is proportional to \Bi — B\; (b) = c\Bi — B\ 
and the coefficient c depends only on the dimensionality of the block. However, 
when the dimension of the block increases, the two random vectors — B and 
X) are close to orthogonal and the typical projection is small. But if some block 
is far away from all the other blocks, then with some probability, the projection 
will be large. The method resembles that of Ref. [4] for finding nearest neighbor. 

As mentioned above we ought to look for outliers in the distribution. This 
would be difficult in the case of many dimensions, but easier in the case of one 
dimensional projection. 

We will regard only projections orthogonal to the vector with components 
proportional to Xo{p) = l,Vp. The projection on the direction of Xq is pro- 
portional to the mean brightness of the area and thus can be considered as not 
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SO important characteristics of the image. An alternative interpretation of the 
above statement is by considering aU blocks to differ only by their brightness. 
Mathematically the projections orthogonal to Xq have the property: 



The distribution of the values of the projections satisfying the property ^ 
is well known and universal [TOj for the natural images. The same distribution 
seems to be valid for a vast majority of the images. The distribution of the 
projections derived for the X-ray image, shown in Fig. [l] is shown in Fig. [2l 

Roughly speaking, the distribution satisfies a power law distribution in log- 
log scale if the blocks are small enough with exponential drop at the extremes. 
When the blocks are big enough, the exponential part is predominant. 

If Ar and AJ, have similar projections, then they will belong to one and the 
same or to neighbors bins. 

Therefore we can look for blocks that have a minimal number of similar 
and large projections. But these, due to the universality of the distribution, are 
exactly the blocks with large projection values. 

As a first approximation, we can just consider the projections and score the 
points according to the bin they belongs to. The distribution can be described 
by only one parameter that, for convenience, can be chosen to be the standard 
deviation ax of the distribution of X.B. 

The notion of "large value of the projection" will be different for different 
projections but will be always proportional to the standard deviation^ Therefore 
we can define a parameter a and score the blocks with |Ar._B| > aax- 

This procedure consists of the following steps: 

0. Construct a figure B with the same shape as A and with all pixels equal 
to zero. 

1. Generate a random projection operator X, with carrier with shape S, zero 
mean and norm one. 

2. Project all blocks (convolute the figure). We denote the resulting figure as 



3. Calculate the standard derivation ax of the result of the convolution. 

4. For all points of C with absolute values greater than aax, increment the 
corresponding pixel in B. 

Repeat steps 1-4 for M number of times. 

5. Select the maximal values of B as the most singular part of the image. 
The number of iterations M can be fixed empirically or until the changes in 

B, normalized by that number, become insignificant. Following the algorithm, 
one can see that the time to perform it is proportional to MN'^ \ogN . The speed 
per image of size 1024 x 2048 on one and the same computer, with 5, a square 

In general, the standard deviation will be larger for projections with larger low- 
frequency components. That is why we choose the criterion proportional to ox and 
not as absolute value. 




(3) 



p 



C. 
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Fig. 3. Score values for different size of the shape (from left to right: 
24x24,32x32,40x40,56x56). The value of the parameter a in all the cases is 22. 

of size 56 x 56 points, is about 3 seconds compared to about an hour, using the 
direct search implementing the Eq. (l).EI 

Some results are presented in Figs l3l4l where we used square shapes with 
different size, 30 projection operators and different values of a. 

Because the distribution of the projections (Fig. 2) is universal, it is not 
surprising that the algorithm is operational for different images. We have tested 
it with some 100 medical Xray images and the results of the visual inspections 
were goocQ 

It can be noted that the number of projection operators is not critical and can 
be kept relatively low and independent of the size of the block. Note that with 
significantly large blocks, the results can not be regarded as en edge detector. 
This empirical observation is not a trivial result at all, indicating that the degrees 
of freedom are relatively few, even with large enough blocks, something that 
depends on the statistics of the images and can not be stated in general. With 
more than 20 projections we achieve satisfactory results, even for areas with 
more than 3000 pixels. The increment of the number of the projections improves 
the quality, but with more than 30 projection practically no improvement can 
be observed. 

It is possible to look at that algorithm in a different way. Namely, if we 
are trying to reconstruct the figure by using some projection operators Xc (for 
example DCT as in JPEG), then the length of the code, one uses to code a 
component with distribution like Fig. O will be proportional to the logarithm 

^ If the block is small enough, the convolution can be performed even faster in the 
space domain and it is possible to improve the execution time. 

* Some of the images require normalization of the projection with the deviation of the 
block B. 
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Fig. 4. Score values for different parameter a (from left to right: a = 8,10,12,16). 
The size of the shape is 24x24. 

of the probability of some value of the projection Xc-A. Therefore, what we are 
scoring is the block that has some component of the code larger than some length 
in bits (here we ignore the psychometric aspects of the coding). Effectively we 
score the blocks with longer coding, e.g. the ones that have lower probability of 
occurrence. 

Using a smoothed version of the above algorithm in step 4, without adding 

only one or zero, but for example, penalizing the point with the square of the 
projection difference in respect to the current block divided by a, and having in 
mind the universal distribution of the projection, one can compute the penalty 
function as a function of the value of the projection x, that results to be just 
1/2 + x^/2a^. Summing over all projections, we can find that the probability of 
finding the best block is approximated given by 1/2[1 + erfc(M(l/2 + a;^/2cr^))] 
as a consequence of the Central Limit Theorem. The above estimation gives an 
idea why one need few projections to find the rarest block, in sense of the global 
distribution of the blocks, almost independently of the size of the block. The only 
dependence of the size of the blocks is given by cr^ factor, that is proportional 
to its size. Further, the probability of error will drop better than exponentially 
with the increment of M. 

The non-smoothed version performs somewhat better that the above estima- 
tion in the computer experiments. 

2.2 Network 

The pitfall of the consideration in the previous subsection is that the detected 
blocks are rare in absolute sense, e.g. in respect to all figures that satisfy the 
power law or similar distribution of the projections. Actually this is not desirable. 
If for example in X-ray image appear several spinal segments, although these can 
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be rare in the context of all existing images, they are not rare in the context of 
thorax or chest X-ray images. 

Therefore the parts of the images with many similar projections must "can- 
cel" each other. This gives us the idea to build a network, where its components 
with similar projection are connected by a negative feedback corresponding to 
the blocks with similar projections. 

As we have seen in the previous section, the small projection values are much 
more probable and therefore less informative. Using this empirical argument, we 
can suggest that the connections between the blocks with large projections are 
more significant. 

The network is symmetrical by its nature, because of the reflexivity of the 
distances. We can try to build it in a way similar to the Hebb network \2\ and 
define Lyapunov o energy function of the network. Thus the network can be 
described in terms of artificial recursive neural network. Connecting only the 
elements of the image that produce large projections, the network can be build 
extremely sparse [11], which makes it feasible in real cases. 

Let us try to formalize the above considerations. For each point we define 
a neuron. The neurons corresponding to some point r and having projection x 
receive a positive input flux, which is proportional to — logp(a;), where p is the 
probability of having projection with value x. The same element, if its projection 
is large, also receives a negative flux from the points r' with nearest projections 
that satisfy the condition |r — r'\ > diam(S'). The flux in general is a function 
of p{x) and x' — x. 

As a first approximation we assume that the flux is constant with p{x) and 
the dependence on a;' — a; is trivial: the weight is 1 if \x' — x\ < 5 and zero 
otherwise, where 5 is some parameter of the model. 

In other words, we reformulate our problem in terms of a Hebb-like neural 
network with external field 

M 

h = -hQ^\ogp{xi) (4) 

and weights 

M 

Wrr' = - XI 1- (^) 

\xi\ > acTi, 
\x'^\ > a(Ji, 
\x^ - Xi\ < S, x\x.i > 

The extra parameter ho balances between the global and the local effects. It can 
be chosen in a way that the mean fluxes of positive and negative currents are 
equal in the whole network. The parameter S, as a proof of concept value, can 
be assumed to be equal to infinity. So the only parameter, as in the previous 
case, is a. 
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Fig. 5. Comparison between score image (left) and network activity image 
(right). The size of the area is 24x24 and the parameter a ~ 16. 



The dynamics of the network over time t is given by the following equation 



3j.{t + 1) = g{fi[hr + ^ Wrr'Srit) - T]), 



where g[.) is a sigmoid function, Sr{t) is the state of the neuron s at position r 
and time t, (3 is the inverse temperature and T is the threshold of the system. 
The result must be insensitive to the particular chose of (;(.). 

Once the network is constructed, we need to choose its initial state. If the a 
priori probabilities for all points to be the origin of the rarest block are equal, 
one can choose Sr{Q) = 1, Vr. Due to the non-linearity, the analysis of the 
results is not straightforward. The existence of the attractor is guaranteed by 
the symmetrical nature of the weights w, which is a necessary condition for the 
existence of an energy function. 

We can further refine the results of the previous section by fixing the global 
threshold T in a way to have only some fraction of the excited neurons. Thus 
we obtain a bump activity of the network, previously considered in [9l7l8j . A 
sample result is shown in FiglS] 

Regarding the time analysis of the procedure, one can see that the execution 
times are proportional to the number of the weights w. Having in mind that 
actually the connectivity is between the blocks, and that we can use a fraction 
of blocks less than l/N'^, the execution time can drop to order inferior to the 
limit. Thus, the number of steps to achieve the attractor is of the order logN. 



3 Discussion and Future Directions 



In this paper we present a method to find the most unusual (rare) part in two 
and higher dimensional images, when its shape is fixed, but in general arbitrary. 
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The method is almost independent on the size of the shape in terms of the 
execution speed and time. It gives good results on experimental images without 
predefined model of the interesting event. 

One necessary future development of the algorithm is to achieve practical and 
computable criteria of the " rareness" of the block and comparing the results on 
large enough database in order to have qualitative measure of the results. The 
criterion must be different from Eq. (1), because its direct computing tends to 
be very slow and crispy. 

Exact calculus of the probabilistic features of the network in the thermody- 
namics limit, performed in the sense of probability of finding the outliers, are 
also of common interest. 

Among the future applications of the present method, one could mention 
the achievement of experiments on diflferent type of images and large image 
databases and experiments on acceleration of the network due to the special 
equivalence class construction. 
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